Add EWMA and load biasing crates for failure-aware P2C balancing by unleashed · Pull Request #4537 · linkerd/linkerd2-proxy

unleashed · 2026-05-21T18:36:26Z

Today the proxy's P2C load balancer uses Tower's PeakEwma, which tracks
only round-trip time. An endpoint returning fast 503s or 429s looks
"fast" to PeakEwma, so P2C keeps routing traffic to it. This is exactly the
opposite of what operators want.

This PR adds the building blocks to make P2C failure-aware, but does not yet
wire anything into the proxy stack, to keep the review scope manageable.
Follow-up PRs will use these building blocks to activate this code
and implement related features in the circuit breaker.

Here are the main components:

linkerd-ewma. A standalone EWMA crate that supports non-mutating
time-projected reads and records externally supplied peak values.
Tower's internal RttEstimate is private, mutates on read, and samples
only elapsed request time, so it cannot provide the read-only load
projection or the injected failure penalties that failure-aware load
balancing needs.

retry_after module in linkerd-http-classify. Parsers for HTTP
Retry-After (delay-seconds and HTTP-date per RFC 7231) and gRPC
grpc-retry-pushback-ms (per gRPC A6 spec), so the load biaser and the
upcoming circuit breaker can honor server backoff hints.

linkerd-load-biaser. A Tower Service wrapper implementing
tower::load::Load that tracks per-endpoint RTT via EWMA and injects
temporary load penalties on failure responses (HTTP 429/503/5xx, gRPC
RESOURCE_EXHAUSTED/UNAVAILABLE). When a Retry-After hint is present the
penalty is amplified to remain meaningful through the server-requested
backoff window. The load metric is rtt * (pending + 1), giving P2C
the ability to steer traffic away from unhealthy endpoints while preserving
the same behavior as PeakEwma when all of them are healthy.

Introduce linkerd-ewma, a general-purpose exponentially-weighted moving average crate. The crate provides five public methods on an Ewma struct: new (initializes with INFINITY sentinel), get (returns stored value), add (blends a new sample using exponential decay), add_peak (replaces stored value when the new sample exceeds it), and add_rate (derives a rate from the inverse of the elapsed interval and feeds it through add). This is being added in spite of tower::PeakEwma because this is not limited to middleware-based RTT computing. We specifically plan to use this implementation for a load biasing feature and a success-rate circuit breaker policy, which would otherwise not be possible. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Extend linkerd-ewma with the API surface needed for success-rate circuit breaking. A MIN_DECAY constant (1 ms) is now applied in both constructors so that a zero-duration decay never produces division-by-zero or NaN results in downstream arithmetic. New methods: new_with_value sets an explicit initial sample instead of the INFINITY sentinel, reset overwrites both value and timestamp for breaker recovery, and get_at projects the stored value forward through exponential decay without mutating internal state. Also add_peak is now decay-aware: it projects the stored value to the candidate timestamp before deciding whether to replace it, and it unconditionally replaces INFINITY so that the first real sample always takes effect even at the construction timestamp. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
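For readers following along, here is a minimal sketch of the behavior these two commit messages describe: a peak EWMA with an INFINITY sentinel, a MIN_DECAY floor, a non-mutating `get_at`, and a decay-aware `add_peak`. The field names, the signatures, and the choice to decay toward zero (as Tower's PeakEwma does) are assumptions made for illustration, not the crate's actual code.

```rust
use std::time::{Duration, Instant};

/// Floor for the decay window so a zero duration never divides by zero.
const MIN_DECAY: Duration = Duration::from_millis(1);

/// Illustrative peak EWMA; field names and signatures are assumptions.
#[derive(Clone, Copy, Debug)]
pub struct Ewma {
    value: f64,
    decay_secs: f64,
    updated_at: Instant,
}

impl Ewma {
    /// Starts with an INFINITY sentinel meaning "no sample recorded yet".
    pub fn new(decay: Duration, now: Instant) -> Self {
        Self::new_with_value(decay, f64::INFINITY, now)
    }

    /// Starts from an explicit initial sample; the decay is clamped to MIN_DECAY.
    pub fn new_with_value(decay: Duration, value: f64, now: Instant) -> Self {
        let decay_secs = decay.max(MIN_DECAY).as_secs_f64();
        Self { value, decay_secs, updated_at: now }
    }

    /// exp(-elapsed / decay): 1.0 at the stored timestamp, approaching 0 later.
    fn decay_factor(&self, at: Instant) -> f64 {
        let elapsed = at.saturating_duration_since(self.updated_at).as_secs_f64();
        (-elapsed / self.decay_secs).exp()
    }

    /// Projects the stored value to `at` without mutating any state.
    pub fn get_at(&self, at: Instant) -> f64 {
        if self.value.is_infinite() {
            return self.value;
        }
        self.value * self.decay_factor(at)
    }

    /// Replaces the value when the sample exceeds the decayed projection
    /// (or when no sample exists yet); otherwise blends the sample in.
    pub fn add_peak(&mut self, sample: f64, at: Instant) {
        let decay = self.decay_factor(at);
        if self.value.is_infinite() || sample > self.value * decay {
            self.value = sample;
        } else {
            self.value = self.value * decay + sample * (1.0 - decay);
        }
        self.updated_at = self.updated_at.max(at);
    }
}
```

Because `get_at` takes `&self`, a reader can project the load under a shared lock, which is the property the description says Tower's RttEstimate cannot offer.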

Add a retry_after module to linkerd-http-classify with shared parsing functions for extracting backoff hints from HTTP and gRPC responses. parse_retry_after handles 429/503 responses with both delay-seconds and HTTP-date formats per RFC 7231, capping the returned duration at a caller-specified maximum. parse_grpc_retry_pushback reads the grpc-retry-pushback-ms header per the gRPC A6 spec, rejecting negative values and capping positive ones. We use the httpdate crate for the actual RFC 7231 HTTP-date parsing. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
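As a rough illustration of the parsing rules above, the sketch below handles the delay-seconds form of Retry-After and the grpc-retry-pushback-ms value, using only the standard library. It takes raw header strings rather than response types, omits the HTTP-date form (which the commit delegates to the httpdate crate), and its function names and signatures are assumptions rather than the module's real API.

```rust
use std::time::Duration;

/// Parses the delay-seconds form of Retry-After and caps it at `max`.
/// The HTTP-date form is intentionally left out of this sketch.
pub fn parse_retry_after_secs(value: &str, max: Duration) -> Option<Duration> {
    let secs: u64 = value.trim().parse().ok()?;
    Some(Duration::from_secs(secs).min(max))
}

/// Parses a grpc-retry-pushback-ms value, rejecting negative values
/// and capping positive ones at `max`.
pub fn parse_grpc_retry_pushback(value: &str, max: Duration) -> Option<Duration> {
    let ms: i32 = value.trim().parse().ok()?;
    if ms < 0 {
        return None;
    }
    Some(Duration::from_millis(ms as u64).min(max))
}
```

Returning `None` for unparseable or negative input lets a caller fall back to whatever default penalty it would apply without a hint.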

…re penalties Introduce the linkerd-load-biaser crate, which wraps any tower::Service to provide per-endpoint load metrics for P2C balancing. The crate tracks request latency via EWMA and injects penalties when failure responses are detected, steering traffic away from unhealthy endpoints. Penalty injection covers HTTP 429/503/5xx and gRPC RESOURCE_EXHAUSTED/UNAVAILABLE trailers-only responses (not streaming gRPC failures since we can only access headers here). For responses with backoff hints, Retry-After on HTTP 429/503 or grpc-retry-pushback-ms on gRPC trailers-only errors, the penalty is amplified so that the EWMA value remains meaningful through the server-requested backoff window. The amplification is clamped to prevent infinity from permanently disabling the endpoint. The load metric is computed as `max(rtt * (pending + 1), penalty)`, where `rtt` is the peak-EWMA latency, and `pending` is the number of in-flight requests. This is returned via tower::load::Load for direct P2C integration. The load biaser is disabled by default, preserving RTT-only behavior (PeakEwma equivalent), unless explicitly activated. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

These cover the complete load biasing lifecycle, including penalty injection, hint parsing, cancellation safety via PinnedDrop, and backwards-compatible behavior when disabled (i.e. RTT-only behavior equivalent to PeakEwma). Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

raykroeker

@unleashed Thanks for the documentation. It really helps understand the intent.
+100

cratelyn

thanks for breaking these additions out into a standalone pull request, separate from the changes we'll be making in our proxy stack(s). that really helped expedite review of this.

Co-authored-by: katelyn martin <kate@buoyant.io>

…_rate_limit_hint The _max parameter was accepted for API symmetry with rate_limit_hint(max) but intentionally unused: the method always caches the uncapped raw value so each consumer can apply its own cap via rate_limit_hint(max). Removing the parameter for now since we probably won't need it; if we do, we can always put it back. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

…or and accessor Make the inner Duration field private and provide CachedRateLimitHint::new() for construction and duration_capped(max) for reads. This prevents consumers from bypassing the per-caller cap that rate_limit_hint(max) enforces, since the cached value is intentionally uncapped. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
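The encapsulation described here could look roughly like the following. The type and method names come from the commit message; the exact signatures and derives are assumptions.

```rust
use std::time::Duration;

/// Caches the raw, uncapped hint. The field is private so that
/// readers must go through `duration_capped`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CachedRateLimitHint(Duration);

impl CachedRateLimitHint {
    /// Stores the hint exactly as parsed, without applying any cap.
    pub fn new(raw: Duration) -> Self {
        Self(raw)
    }

    /// Returns the hint limited to the caller's own maximum.
    pub fn duration_capped(&self, max: Duration) -> Duration {
        self.0.min(max)
    }
}
```

Two consumers with different caps can then share one cached value, for example `hint.duration_capped(Duration::from_secs(5))` in one place and a larger cap in another.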

Explain why a standalone EWMA crate exists instead of using Tower's RttEstimate: it is private, mutates on read, and cannot support the penalty dimension that failure-aware load balancing requires. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The crate only uses tokio::time, so disable the default feature set to avoid pulling unnecessary features into the dependency declaration. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The cancellation test uses tokio::sync::oneshot which requires the sync feature. This compiled only because workspace feature unification pulled it in from other crates. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Replace raw string literals with the module-level constant for consistency with how HTTP tests use http::header::RETRY_AFTER. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Consistent with Ewma::new which already has this attribute. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

the shape of this looks good, but i want to hold off on merging it until we have consensus about load biasing and changes to the control plane.

Inspect the grpc-status header only on HTTP 200 responses whose content-type starts with application/grpc. Without this a non-gRPC upstream that happens to include a grpc-status header would be considered a gRPC failure and penalized by the load biaser. The same check is applied to the gRPC retry-pushback-ms parsing in the ResponseFailureHint trait implementation. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
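A minimal sketch of that guard, written against plain values instead of the http crate's types. The function name and parameters are hypothetical.

```rust
/// Returns true only for responses that should be treated as gRPC:
/// HTTP 200 with a content-type beginning with "application/grpc".
pub fn is_grpc_response(status: u16, content_type: Option<&str>) -> bool {
    status == 200
        && content_type.map_or(false, |ct| ct.starts_with("application/grpc"))
}
```

With this in place, a grpc-status header on, say, a `text/html` response is simply ignored by the classifier.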

Up until now we mapped every non-zero gRPC status code to FailureHint::InternalError, penalizing client errors like CANCELLED, INVALID_ARGUMENT, NOT_FOUND, etc. These don't indicate server health issues and should not steer traffic away from the endpoint. Restrict penalty injection to server-side error codes that indicate endpoint problems: UNKNOWN (2), DEADLINE_EXCEEDED (4), INTERNAL (13), and DATA_LOSS (15), alongside the existing RESOURCE_EXHAUSTED (8) and UNAVAILABLE (14) statuses. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
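The resulting classification can be summarized in a few lines. The numeric codes are the ones listed in the commit message; the function name is hypothetical.

```rust
/// Returns true for gRPC status codes that indicate a problem with the
/// endpoint itself: UNKNOWN (2), DEADLINE_EXCEEDED (4),
/// RESOURCE_EXHAUSTED (8), INTERNAL (13), UNAVAILABLE (14), DATA_LOSS (15).
pub fn grpc_code_is_endpoint_failure(code: u32) -> bool {
    matches!(code, 2 | 4 | 8 | 13 | 14 | 15)
}
```

Client-side codes such as CANCELLED (1), INVALID_ARGUMENT (3), and NOT_FOUND (5) fall through and inject no penalty.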

Ensure only those gRPC status codes indicating server-side errors inject penalties. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Verify that consecutive 429 responses at 1s intervals keep the penalty at the configured level, confirming the EWMA peak resets the decayed value rather than accumulating. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Add a `last_update()` getter that returns the timestamp of the most recent EWMA update. Callers that need to detect staleness (i.e. idle periods after which the EWMA has decayed so far that a single sample dominates) can compare this timestamp against the current time and, for example, require more samples before making decisions. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

see:
* https://github.com/linkerd/linkerd2-proxy-api/releases/tag/v0.20.0
* linkerd/linkerd2-proxy-api#562
* linkerd/linkerd2-proxy-api#559
* linkerd/linkerd2-proxy-api#565
* #4537
* #4546
* #4544

this commit updates linkerd2-proxy-api, pulling in the latest release. aside from a slew of dependency updates, this new version most importantly includes changes to set the state for forthcoming load-balancing work. these new fields are marked with "todo" comments to indicate where future work in `linkerd-proxy-client-policy` in that vein will introduce future enum variants, fields, marshalling and validation, etc. this new version also includes an update to the most recent version of rand, which will allow us to properly update to the latest version of hickory-resolver without introducing breakage in our audit job in CI.

Signed-off-by: katelyn martin <kate@buoyant.io>

see #4537. this commit explores using `Arc::strong_count` for tracking the number of pending requests. some tests fail as a result of this change, which i haven't fully tracked down. Signed-off-by: katelyn martin <kate@buoyant.io>

- Drop unused add_rate, last_update
- Correct MIN_DECAY enforcement comment
- Note on ignoring negative do-not-retry pushbacks

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

…A RTT We now keep a single RTT EWMA and a load of `rtt * (pending + 1)`, exactly like Tower's PeakEwma. A success records its measured RTT, while a failure now records a computed effective RTT through the same peak-EWMA logic, using the Retry-After or grpc-retry-pushback hint when present, or otherwise penalizing the RTT with a base value. In-flight requests are now counted the way Tower's PeakEwma counts them, using Arc's strong count and measuring on cancellation. Finally an explicit completion tracker can use `PendingUntilFirstData` for measurement to more closely match previous behavior. `linkerd-ewma` is still a separate crate because we feed it a penalty value rather than a measured RTT, and since Tower's `RttEstimate` is private (at the moment) and advances its decay clock on read, it can't accept an injected observation nor be read under a shared lock. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
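To make the counting scheme concrete, here is a small sketch of tracking in-flight requests through `Arc::strong_count` and deriving the load from it. The `Pending` and `Handle` types are hypothetical stand-ins, not the crate's types; only `Arc::strong_count` and the `rtt * (pending + 1)` formula come from the text above.

```rust
use std::sync::Arc;

/// Owner side: holds the one reference that is not a request.
#[derive(Default)]
pub struct Pending(Arc<()>);

/// One per in-flight request; dropping it lowers the strong count.
pub struct Handle {
    _guard: Arc<()>,
}

impl Pending {
    /// Registers a new in-flight request.
    pub fn track(&self) -> Handle {
        Handle { _guard: self.0.clone() }
    }

    /// Number of live handles: the strong count minus the owner's reference.
    pub fn count(&self) -> usize {
        Arc::strong_count(&self.0) - 1
    }
}

/// Load as described above: the RTT estimate scaled by in-flight requests.
pub fn load(rtt_secs: f64, pending: usize) -> f64 {
    rtt_secs * (pending as f64 + 1.0)
}
```

Since the count falls whenever a handle is dropped, a cancelled request is accounted for without any explicit bookkeeping.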

…fault Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed · 2026-06-05T11:40:28Z

Addressed quite a few comments with a large refactoring. The ewma should now be way closer to Tower's, and the load biaser is now using a single RTT ewma, retry-after/grpc-pushback hints now behave monotonically, the measurement is taken by default when the first data frame arrives, and we use a u32 instead of a float in the config allowing us to derive Hash/Eq.

I think this may close a few of the open PRs against this one, @cratelyn. Probably a good moment to take another look?

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

The gRPC A6 spec defines grpc-retry-pushback-ms as an i32. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

We can now trim our tokio flags and drop the tokio-test dependency. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

penalty_ms on SharedState is in milliseconds. Remove references to types that are not integrated yet. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Conveys meaning without coupling type nor constant. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Add a test exercising the case of a sample at or below the still undecayed peak. It should not replace the peak, but compute the value blending in the new sample. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Ensure that add() discards a sample whose timestamp is at or before the stored one. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

A hint below the base penalty such as 0 records a low effective RTT and can make a failing endpoint look healthier than it should. Ensure a failure's recorded measurement is at least the base penalty, so that retry hints take effect only when they exceed that penalty. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
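The floor described in this commit amounts to taking the larger of the hint and the base penalty. A minimal sketch, with a hypothetical function name and signature:

```rust
use std::time::Duration;

/// Measurement recorded for a failed response: never below the base
/// penalty, and above it only when the server's hint is larger.
pub fn failure_measurement(base_penalty: Duration, hint: Option<Duration>) -> Duration {
    hint.map_or(base_penalty, |h| h.max(base_penalty))
}
```

A `Retry-After: 0` therefore records the base penalty rather than a zero RTT, so the failing endpoint cannot look healthier than a failure without any hint.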

test_rtt_tracked_after_request resolved instantly under paused time and only checked that the RTT moved minimally. Drive a request that takes a measurable delay and assert the recorded RTT reflects it. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

Existing tests raised the pending count with disabled handles or a single request. Try now with two concurrent requests and assert the strong count reports two pending, then assert the count falls when they resolve. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed · 2026-06-08T13:26:10Z

I think most open threads are either stale or addressed with the latest changes.

cratelyn · 2026-06-08T14:39:22Z

Addressed quite a few comments with a large refactoring. The ewma should now be way closer to Tower's, and the load biaser is now using a single RTT ewma, retry-after/grpc-pushback hints now behave monotonically, the measurement is taken by default when the first data frame arrives, and we use a u32 instead of a float in the config allowing us to derive Hash/Eq.

I think this may close a few of the open PRs against this one, @cratelyn. Probably a good moment to take another look?

i've closed the following:

adleong · 2026-06-08T18:18:20Z

+                // A transport error (connection refused, reset) or cancellation with
+                // no response to classify. Record its elapsed time.
+                drop(handle);


I notice that this seems to be a deliberate difference from the tower implementation. But is there a functional difference? In the tower case, if the inner Future fails, then the TrackCompletionFuture will also fail and presumably be dropped, dropping the held Handle along with it. So what's the intention here as it differs from Tower?

No functional difference. The drop is made explicit here because conceptually this is where we want the measurement to take place; without it, the handle would be dropped anyway right before we return.

There is a subtle timing difference, where arguably we do the right thing by first measuring then constructing the retval, with the implicit case doing the reverse, but this is negligible for most intents and purposes.

adleong · 2026-06-08T18:26:52Z

+    /// Records the measured elapsed time, then prevents the drop from recording
+    /// again.
+    fn record_elapsed(&mut self, now: Instant) {
+        if self.enabled {


I think we could clean up the need for protecting against double-recording here and simplify by having the Handle struct hold a penalty: Option<Duration>. This way, when the completion Future receives a failure Response, it can set the penalty into the Handle and then drop it.

When the Handle is dropped, it can record the stored penalty, if it is Some and the elapsed time otherwise.

That's way cleaner, but behavior is different. It would penalize endpoints when the completion tracker finishes, not when we know that the endpoint should be penalized.

This is not an issue in the normal case, but it would allow a struggling endpoint returning a 429/5XX status and a very slow body to defer the penalty while the balancer keeps considering the endpoint healthy, meanwhile routing traffic to it. In such a scenario it's possible or even likely the endpoint will keep struggling (perhaps even more so) with more traffic, while still not having returned the initial data frame. During that time window until the first penalty lands, which could be several seconds, the endpoint receives traffic that should have been directed elsewhere, because the growth of the load function is minimal compared to a penalty.

If we conclude that it's best to just defer this signal to the time of completion then I agree we can clean this up nicely, but I think the current behavior is useful and a good trade-off at the cost of this boolean and a bit more code.

If we use Option<Handle> as the H type, we can drop the Handle itself so that the value is recorded immediately and pass None into the completion tracker.
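One possible reading of the `Option<Handle>` idea, as a sketch only: on a failure response the handle is given the penalty and dropped on the spot, so the penalty lands immediately and the completion tracker receives `None`; on success the handle is passed along and records when the tracker finishes. The `Handle` here writes to a shared log instead of an EWMA, and every name in the block is hypothetical.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::time::Duration;

/// Records its measurement exactly once, when dropped.
pub struct Handle {
    log: Rc<RefCell<Vec<Duration>>>,
    measurement: Duration,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.measurement);
    }
}

/// Decides what the completion tracker should hold after the response
/// head arrives. A failure records its penalty now and hands over `None`.
pub fn on_response(mut handle: Handle, failure_penalty: Option<Duration>) -> Option<Handle> {
    match failure_penalty {
        Some(penalty) => {
            handle.measurement = penalty;
            drop(handle); // the penalty is recorded here, not at body completion
            None
        }
        None => Some(handle),
    }
}
```

Because the only recording path is `Drop`, no separate flag is needed to guard against recording twice.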

cratelyn · 2026-06-08T18:51:00Z

my only remaining piece of blocking feedback is #4537 (comment).

The inject_rtt test helper was using add_peak, which replaces the RTT estimate when the measurement exceeds the current decayed projection. The EWMA is seeded with default_rtt (0.1s in tests), so injecting a lower value such as 0.05 left the estimate near 0.1. Reset the estimate instead, and also assert the actual values in the comments. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed added 5 commits May 21, 2026 20:25

unleashed requested a review from cratelyn May 21, 2026 18:36

unleashed requested a review from a team as a code owner May 21, 2026 18:36

raykroeker reviewed May 22, 2026

cratelyn assigned unleashed May 22, 2026

cratelyn reviewed May 22, 2026

cratelyn previously approved these changes May 22, 2026

cratelyn reviewed May 22, 2026

Comment thread linkerd/load-biaser/src/lib.rs Outdated

cratelyn reviewed May 22, 2026

Comment thread linkerd/load-biaser/src/lib.rs Outdated

unleashed and others added 2 commits May 26, 2026 13:40

Update linkerd/http/classify/Cargo.toml

342f7d9

Co-authored-by: katelyn martin <kate@buoyant.io>

Update linkerd/load-biaser/Cargo.toml

11ee334

Co-authored-by: katelyn martin <kate@buoyant.io>

unleashed mentioned this pull request May 26, 2026

feat(policy): support load-bias and retry-after balancer annotations linkerd/linkerd2#15317

Open

unleashed added 8 commits May 26, 2026 20:52

build(ewma): disable default tokio features

6ed442b

The crate only uses tokio::time, so disable the default feature set to avoid pulling unnecessary features into the dependency declaration. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(classify): use GRPC_RETRY_PUSHBACK_MS constant in tests

b74b52a

Replace raw string literals with the module-level constant for consistency with how HTTP tests use http::header::RETRY_AFTER. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

fix(ewma): fix typo in test comment

ae6de47

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): add #[must_use] to LoadBiaser::new

3023422

Consistent with Ewma::new which already has this attribute. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed added 5 commits May 28, 2026 11:46

test(load-biaser): add tests for extended gRPC status classification

12adaca

Ensure only those gRPC status codes indicating server-side errors inject penalties. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

cratelyn mentioned this pull request Jun 4, 2026

refactor(load-biaser): replace pending with arc strong count #4556

Closed

unleashed added 6 commits June 5, 2026 13:33

fix(ewma,http-classify): address feedback

d7c03b5

- Drop unused add_rate, last_update
- Correct MIN_DECAY enforcement comment
- Note on ignoring negative do-not-retry pushbacks

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): measure RTT to first response data frame by de…

2589a46

…fault Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): store the failure penalty as integer milliseconds

ee9e722

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): name the default RTT and decay durations

48e1cc3

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

docs(load-biaser): name the gRPC status codes in the failure classifier

163df42

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

unleashed added 14 commits June 5, 2026 20:00

docs(ewma): reword stale crate comment

761a019

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

fix(http-classify): parse grpc pushback as i32

73ce450

The gRPC A6 spec defines grpc-retry-pushback-ms as an i32. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): consolidate mocks into MockService

9e19b9a

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): apply ResponseFailureHint feedback

a117373

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(load-biaser): remove unused get_ref

e3f47cb

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(load-biaser): assert load values in pending test

b19fa51

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

build(load-biaser): drop orphaned test deps

e212159

We can now trim our tokio flags and drop the tokio-test dependency. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

docs(load-biaser): correct stale comments

c3bc7c6

penalty_ms on SharedState is in milliseconds. Remove references to types that are not integrated yet. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

refactor(ewma): use is_infinite instead of f64::INFINITY

841abfa

Conveys meaning without coupling type nor constant. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(ewma): test add_peak when providing a lower measurement

cfad958

Add a test exercising the case of a sample at or below the still undecayed peak. It should not replace the peak, but compute the value blending in the new sample. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

test(ewma): add() ignores old timestamp measurements

1d592a9

Ensure that add() discards a sample whose timestamp is at or before the stored one. Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

adleong approved these changes Jun 8, 2026

cratelyn approved these changes Jun 9, 2026
